External Validity: From Do-Calculus to Transportability Across Populations
The generalizability of empirical findings to new environments, settings or
populations, often called "external validity," is essential in most scientific
explorations. This paper treats a particular problem of generalizability,
called "transportability," defined as a license to transfer causal effects
learned in experimental studies to a new population, in which only
observational studies can be conducted. We introduce a formal representation
called "selection diagrams" for expressing knowledge about differences and
commonalities between populations of interest and, using this representation,
we reduce questions of transportability to symbolic derivations in the
do-calculus. This reduction yields graph-based procedures for deciding, prior
to observing any data, whether causal effects in the target population can be
inferred from experimental findings in the study population. When the answer is
affirmative, the procedures identify what experimental and observational
findings need to be obtained from the two populations, and how they can be
combined to ensure bias-free transport.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at
http://dx.doi.org/10.1214/14-STS486 by the Institute of Mathematical
Statistics (http://www.imstat.org). arXiv admin note: text overlap with
arXiv:1312.748
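A minimal sketch of one graph-based check of the kind this abstract describes, written under stated assumptions. A set Z is S-admissible when (Y ⊥ S | X, Z) holds in the selection diagram after every arrow into X is removed. When it holds, P*(y | do(x)) = sum_z P(y | do(x), z) P*(z). This is only the simplest sufficient condition in the framework, not the paper's full decision procedure. The dictionary encoding {child: [parents]}, the function names, and the example diagram are illustrative assumptions. d-separation is tested with the standard moralized-ancestral-graph criterion.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with every ancestor of them in the DAG."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """Test (xs _||_ ys | zs) in a DAG encoded as {child: [parents]},
    using the moralized ancestral graph criterion."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:  # keep parent-child edges as undirected edges
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):  # "marry" parents sharing a child
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Delete the conditioning set, then search for any remaining path.
    blocked = set(zs)
    frontier = deque(v for v in xs if v not in blocked)
    reached = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in ys:
            return False
        for w in adj[v] - blocked - reached:
            reached.add(w)
            frontier.append(w)
    return True


def s_admissible(parents, s_nodes, x, y, z):
    """Check (Y _||_ S | X, Z) in the diagram with all arrows into X cut."""
    cut = {v: ([] if v in x else list(ps)) for v, ps in parents.items()}
    return d_separated(cut, s_nodes, y, set(x) | set(z))


# Hypothetical selection diagram: S marks a population difference in Z,
# and Z confounds the effect of X on Y.
diagram = {"Z": ["S"], "X": ["Z"], "Y": ["X", "Z"]}
print(s_admissible(diagram, {"S"}, {"X"}, {"Y"}, {"Z"}))  # True
print(s_admissible(diagram, {"S"}, {"X"}, {"Y"}, set()))  # False
```

In this toy diagram, adjusting for Z licenses transport. Ignoring Z does not, because the population difference at Z reaches Y.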
Reconciling Predictive and Statistical Parity: A Causal Approach
Since the rise of fair machine learning as a critical field of inquiry, many
different notions of how to quantify and measure discrimination have been
proposed in the literature. Some of these notions, however, were shown to be
mutually incompatible. Such findings make it appear that numerous different
kinds of fairness exist, thereby making a consensus on the appropriate measure
of fairness harder to reach and hindering the application of these tools in
practice. In this paper, we investigate one of these key impossibility results
that relates the notions of statistical and predictive parity. Specifically, we
derive a new causal decomposition formula for the fairness measures associated
with predictive parity, and obtain a novel insight into how this criterion is
related to statistical parity through the legal doctrines of disparate
treatment, disparate impact, and the notion of business necessity. Our results
show that through a more careful causal analysis, the notions of statistical
and predictive parity are not really mutually exclusive, but complementary and
spanning a spectrum of fairness notions through the concept of business
necessity. Finally, we demonstrate the importance of our findings on a
real-world example.
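For reference, here are the two observational measures whose tension the paper studies, computed on a toy dataset. This is not the paper's causal decomposition. Statistical parity compares P(Ŷ=1 | X=x1) with P(Ŷ=1 | X=x0). Predictive parity is shown here in its positive-predictive-value form, comparing P(Y=1 | Ŷ=1, X=x1) with P(Y=1 | Ŷ=1, X=x0). The function names and data are illustrative assumptions.

```python
def rate(values):
    """Empirical mean of a list of 0/1 values."""
    return sum(values) / len(values)


def statistical_parity_gap(y_hat, x):
    """P(Yhat=1 | X=1) - P(Yhat=1 | X=0)."""
    g1 = [p for p, a in zip(y_hat, x) if a == 1]
    g0 = [p for p, a in zip(y_hat, x) if a == 0]
    return rate(g1) - rate(g0)


def predictive_parity_gap(y, y_hat, x):
    """P(Y=1 | Yhat=1, X=1) - P(Y=1 | Yhat=1, X=0)."""
    g1 = [t for t, p, a in zip(y, y_hat, x) if a == 1 and p == 1]
    g0 = [t for t, p, a in zip(y, y_hat, x) if a == 0 and p == 1]
    return rate(g1) - rate(g0)


# Hypothetical data: base rates differ across groups and the predictor
# is perfect (y_hat == y).
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
y_hat = list(y)
print(statistical_parity_gap(y_hat, x))    # 0.5
print(predictive_parity_gap(y, y_hat, x))  # 0.0
```

The example shows the familiar conflict. When base rates differ, a predictor can satisfy predictive parity while violating statistical parity.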
Causal Inference and Data-Fusion in Econometrics
Learning about cause and effect is arguably the main goal in applied
econometrics. In practice, the validity of these causal inferences is
contingent on a number of critical assumptions regarding the type of data that
has been collected and the substantive knowledge that is available. For
instance, unobserved confounding factors threaten the internal validity of
estimates, data availability is often limited to non-random, selection-biased
samples, causal effects need to be learned from surrogate experiments with
imperfect compliance, and causal knowledge has to be extrapolated across
structurally heterogeneous populations. A powerful causal inference framework
is required to tackle these challenges, which plague most data analysis to
varying degrees. Building on the structural approach to causality introduced by
Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the
artificial intelligence (AI) literature has developed a wide array of
techniques for causal learning that allow one to leverage information from various
imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016).
In this paper, we discuss recent advances in this literature that have the
potential to contribute to econometric methodology along three dimensions.
First, they provide a unified and comprehensive framework for causal inference,
in which the aforementioned problems can be addressed in full generality.
Second, due to their origin in AI, they come with sound, efficient, and
complete algorithmic criteria for automating the corresponding
identification task. And third, because of the nonparametric description of
structural models that graph-theoretic approaches build on, they combine the
strengths of both structural econometrics as well as the potential outcomes
framework, and thus offer a perfect middle ground between these two competing
literature streams.
Comment: Abstract change
A General Algorithm for Deciding Transportability of Experimental Results
Generalizing empirical findings to new environments, settings, or populations
is essential in most scientific explorations. This article treats a particular
problem of generalizability, called "transportability", defined as a license to
transfer information learned in experimental studies to a different population,
on which only observational studies can be conducted. Given a set of
assumptions concerning commonalities and differences between the two
populations, Pearl and Bareinboim (2011) derived sufficient conditions that
permit such transfer to take place. This article summarizes their findings and
supplements them with an effective procedure for deciding when and how
transportability is feasible. It establishes a necessary and sufficient
condition for deciding when causal effects in the target population are
estimable from both the statistical information available and the causal
information transferred from the experiments. The article further provides a
complete algorithm for computing the transport formula, that is, a way of
combining observational and experimental information to synthesize a bias-free
estimate of the desired causal relation. Finally, the article examines the
differences between transportability and other variants of generalizability
Causal Fairness for Outcome Control
As society transitions towards an AI-based decision-making infrastructure, an
ever-increasing number of decisions once under the control of humans are now
delegated to automated systems. Even though such developments make various
parts of society more efficient, a large body of evidence suggests that a great
deal of care needs to be taken to make such automated decision-making systems
fair and equitable, namely, taking into account sensitive attributes such as
gender, race, and religion. In this paper, we study a specific decision-making
task called outcome control in which an automated system aims to optimize an
outcome variable while being fair and equitable. The interest in such a
setting ranges from interventions related to criminal justice and welfare, all
the way to clinical decision-making and public health. We first
analyze, through a causal lens, the notion of benefit, which captures how much a
specific individual would benefit from a positive decision, counterfactually
speaking, when contrasted with an alternative, negative one. We introduce the
notion of benefit fairness, which can be seen as the minimal fairness
requirement in decision-making, and develop an algorithm for satisfying it. We
then note that the benefit itself may be influenced by the protected attribute,
and propose causal tools which can be used to analyze this. Finally, if some of
the variations of the protected attribute in the benefit are considered as
discriminatory, the notion of benefit fairness may need to be strengthened,
which leads us to articulating a notion of causal benefit fairness. Using this
notion, we develop a new optimization procedure capable of maximizing the outcome
while ensuring causal fairness in the decision process.
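A minimal sketch of a benefit-based allocation, under stated assumptions. Each individual's benefit (the expected counterfactual gain from a positive rather than a negative decision) is assumed to be already estimated, and a fixed treatment budget is assumed. Benefit fairness is read here as requiring P(d | benefit = b, x1) = P(d | benefit = b, x0) for every benefit level b. Ranking by benefit alone, with random tie-breaking, satisfies that condition by construction. This is an illustration, not the paper's algorithm. The function names, the budget parameter, and the data are hypothetical.

```python
import random
from collections import defaultdict


def allocate_by_benefit(benefit, budget, seed=0):
    """Treat the `budget` individuals with the largest estimated benefit.
    Ties are broken uniformly at random, so an individual's chance of
    treatment depends on them only through their benefit value."""
    rng = random.Random(seed)
    order = sorted(range(len(benefit)),
                   key=lambda i: (-benefit[i], rng.random()))
    chosen = set(order[:budget])
    return [1 if i in chosen else 0 for i in range(len(benefit))]


def benefit_fairness_gap(benefit, group, decision):
    """Largest |P(d=1 | benefit=b, x1) - P(d=1 | benefit=b, x0)| over
    benefit levels b observed in both groups (an empirical check)."""
    counts = defaultdict(lambda: [0, 0])  # (b, group) -> [treated, total]
    for b, g, d in zip(benefit, group, decision):
        counts[(b, g)][0] += d
        counts[(b, g)][1] += 1
    gaps = [abs(counts[(b, 1)][0] / counts[(b, 1)][1]
                - counts[(b, 0)][0] / counts[(b, 0)][1])
            for b in {b for b, _ in counts}
            if (b, 0) in counts and (b, 1) in counts]
    return max(gaps, default=0.0)


# Hypothetical estimated benefits and protected-attribute values.
benefit = [0.5, 0.5, 0.2, 0.2, 0.1, 0.1]
group = [0, 1, 0, 1, 0, 1]
d = allocate_by_benefit(benefit, budget=4)
print(d, benefit_fairness_gap(benefit, group, d))  # [1, 1, 1, 1, 0, 0] 0.0
```

As the abstract notes, satisfying this condition says nothing about whether the benefit itself is shaped by the protected attribute. That question is what the stronger causal benefit fairness notion addresses.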
A Causal Framework for Decomposing Spurious Variations
One of the fundamental challenges found throughout the data sciences is to
explain why things happen in specific ways, or through which mechanisms a
certain variable exerts influence over another variable. In statistics
and machine learning, significant efforts have been put into developing
machinery to estimate correlations across variables efficiently. In causal
inference, a large body of literature is concerned with the decomposition of
causal effects under the rubric of mediation analysis. However, many variations
are spurious in nature, including different phenomena throughout the applied
sciences. Despite the statistical power to estimate correlations and the
identification power to decompose causal effects, there is still little
understanding of the properties of spurious associations and how they can be
decomposed in terms of the underlying causal mechanisms. In this manuscript, we
develop formal tools for decomposing spurious variations in both Markovian and
semi-Markovian models. We prove the first results that allow a non-parametric
decomposition of spurious effects and provide sufficient conditions for the
identification of such decompositions. The described approach has several
applications, ranging from explainable and fair AI to questions in epidemiology
and medicine, and we empirically demonstrate its use on a real-world dataset.
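As a point of reference, and not the paper's decomposition itself, here is the coarse split that finer decompositions of spurious variation refine. The observed total variation TV = P(y | x1) - P(y | x0) separates into the total causal effect TE = P(y | do(x1)) - P(y | do(x0)) and a spurious remainder TV - TE. The sketch assumes a Markovian model with binary X, Y and a single observed binary confounder Z (Z -> X, Z -> Y, X -> Y), so TE follows from back-door adjustment. All names and numbers are hypothetical.

```python
def causal_and_spurious(pz, px_given_z, py_given_xz):
    """pz[z] = P(Z=z); px_given_z[z] = P(X=1 | Z=z);
    py_given_xz[(x, z)] = P(Y=1 | X=x, Z=z). All variables binary.
    Returns (TV, TE, spurious = TV - TE)."""
    def p_x(x, z):
        return px_given_z[z] if x == 1 else 1 - px_given_z[z]

    def p_y_given_x(x):
        # Observational: sum_z P(Y=1 | x, z) P(z | x)
        px = sum(p_x(x, z) * pz[z] for z in pz)
        return sum(py_given_xz[(x, z)] * p_x(x, z) * pz[z] for z in pz) / px

    def p_y_do_x(x):
        # Back-door adjustment: sum_z P(Y=1 | x, z) P(z)
        return sum(py_given_xz[(x, z)] * pz[z] for z in pz)

    tv = p_y_given_x(1) - p_y_given_x(0)
    te = p_y_do_x(1) - p_y_do_x(0)
    return tv, te, tv - te


pz = {0: 0.5, 1: 0.5}
px_given_z = {0: 0.2, 1: 0.8}
py_given_xz = {(x, z): 0.1 + 0.3 * x + 0.4 * z for x in (0, 1) for z in (0, 1)}
print(causal_and_spurious(pz, px_given_z, py_given_xz))
```

Here TV is about 0.54, of which about 0.30 is causal and about 0.24 is spurious, arising because Z raises both X and Y. With several confounders, the paper's tools attribute such a spurious remainder to the individual mechanisms.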